Abstract — The concept of vehicle-to-grid (V2G) has gained recent interest as more and more electric vehicles (EVs) are put to use. In this paper, we consider a dynamic aggregator-EVs system, where an aggregator centrally coordinates a large number of dynamic EVs to perform regulation service. We propose a Welfare-Maximizing Regulation Allocation (WMRA) algorithm for the aggregator to fairly allocate the regulation amount among its EVs. Compared to previous works, WMRA accommodates a wide spectrum of vital system characteristics, including EV dynamics, limited EV battery size, EV battery degradation cost, and the cost of using external energy sources for the aggregator. The algorithm operates in real time and does not require any prior knowledge of the statistical information of the system. Theoretically, we demonstrate that WMRA is within O(1/V) of the optimum, where V is a control parameter that depends on the EVs' battery sizes. In addition, our simulation results indicate that WMRA can substantially outperform a suboptimal greedy algorithm.

Index Terms — Aggregator-EVs system; electric vehicles; real-time algorithm; V2G; welfare-maximizing regulation allocation.



I. Introduction 

Electrification of personal transportation is expected to become prevalent in the near future. For example, millions of electric vehicles (EVs) will be operated in the United States by 2015 [1]. Besides serving the purpose of transportation, EVs can also be used as distributed electricity generation/storage devices when plugged in [2]. Hence, the concept of vehicle-to-grid (V2G), referring to the integration of EVs with the power grid, has received increasing attention [2], [3].

Frequency regulation is a service that maintains the balance between power generation and load demand, which is vital for keeping the frequency of the power grid at its nominal value. Traditionally, regulation service is achieved by turning fast-responding generators on or off, and it is the most expensive ancillary service [4]. Experiments show that EV power electronics and batteries can respond well to the frequent regulation signal. Thus, plugged-in EVs are a promising alternative for providing regulation service through charging/discharging, which could potentially reduce the cost of regulation service significantly [5]. However, since regulation service is generally requested on the order of megawatts (MWs) while the power capacity of an EV is typically 5-20 kW, it is often necessary for an aggregator to coordinate a large number of EVs to provide regulation service [6]. In addition, frequent charging/discharging directly affects EV battery life. Thus, it is important to design a proper algorithm for regulation allocation in the aggregator-EVs system, especially one that operates in real time.

There is a growing body of recent work on V2G regulation service. Specific to the aggregator-EVs system, which focuses on the interaction between the aggregator and EVs, centralized regulation allocation is studied in [7]–[11], where the objective is to maximize the profit of the aggregator or the EVs. In [7], a set of schemes based on different criteria of fairness among EVs is provided. In [8], the regulation allocation problem is formulated as a quadratic program. In [9], considering both regulation service and spinning reserves, the underlying problem is formulated as a linear program. In [10], the charging behaviour of EVs is also considered, so that the problem reduces to controlling the charging sequence and the charging rate of each EV, which is solved by dynamic programming. In [11], a real-time regulation control algorithm is proposed by formulating the problem as a Markov decision process, with the action space consisting of charging, discharging, and regulation. Finally, a distributed regulation allocation system is proposed in [12] using game theory, and a smart pricing policy is developed to incentivize EVs.

In addressing the regulation allocation problem, however, these earlier works omit some essential characteristics of the aggregator-EVs system. For example, deterministic models are used in [7] and [10], which ignore the uncertainty of the system, e.g., the uncertainty of electricity prices. The dynamics of the regulation signals are not incorporated in [12], nor is the energy restriction of the EV battery considered. The self-charging/discharging activities in support of the EVs' own needs are omitted in [7] and [12]. The potential cost of using external energy sources for the aggregator to accomplish regulation service is ignored in [7]–[11], and the cost of EV battery degradation due to frequent charging/discharging in performing regulation service is not considered in [8], [10]–[12].

In this work, we consider all of the above factors in a more complete aggregator-EVs system model, and develop a real-time algorithm for the aggregator to fairly allocate the regulation amount among the EVs. Specifically, considering an aggregator-EVs system providing long-term regulation service to a power grid, we aim to maximize the long-term social welfare of the aggregator-EVs system under a long-term constraint on the battery degradation cost of each EV. To solve this stochastic optimization problem, we adopt the Lyapunov optimization technique, which is also used in [13]–[15] for demand side management in the smart grid. We demonstrate how a solution to this maximization problem can be formulated under a general Lyapunov optimization framework [16], and propose a real-time allocation strategy specific to the aggregator-EVs system. The resultant Welfare-Maximizing Regulation Allocation (WMRA) algorithm does not require any statistical information of the system, and is shown to be asymptotically close to the optimum as the EVs' battery capacities increase. Finally, WMRA is compared to a greedy algorithm through simulation and is shown to offer substantial performance gains.

In our preliminary version of this work [17], the EVs are ideally assumed to be static, i.e., they stay in the aggregator-EVs system throughout the operational time. In this paper, to capture the dynamics of the aggregator-EVs system more realistically, we generalize the system model in [17] to accommodate dynamic EVs, which none of the previous works [7]–[12] consider. This generalization is challenging for the centralized control of regulation allocation, since a returning EV may have a different energy state from the one it had when it left the system, and this energy difference makes it considerably harder for the aggregator to enforce each EV's battery energy restriction under regulation service.

The rest of this paper is organized as follows. We describe the system model and formulate the regulation allocation problem in Section II. In Section III, we propose WMRA, and in Section IV, we analyze its performance. Simulation results are presented in Section V, and we conclude in Section VI.

Notation: Denote $[a]^+ \triangleq \max\{a,0\}$, $[a,b]^+ \triangleq \max\{a,b\}$, and $[a,b]^- \triangleq \min\{a,b\}$.

II. System Model and Problem Formulation 

In this section, we propose a centralized dynamic 
aggregator-EVs system and formulate the regulation allocation 
problem mathematically. 

A. Aggregator-EVs System and Regulation Service 

Consider a time-slotted system with the time set $\mathcal{T}=\{0,1,\dots\}$, where the regulation service is performed over equal time intervals of length $\Delta t$. At the beginning of each time slot $t$, the aggregator receives a random regulation signal $G_t$ from the power grid. If $G_t>0$, the aggregator needs to perform regulation down by absorbing $G_t$ units of energy from the power grid during time slot $t$; if $G_t<0$, the aggregator needs to perform regulation up by contributing $|G_t|$ units of energy to the power grid during time slot $t$.

The aggregator coordinates $N$ registered EVs to perform regulation service and can communicate with each EV bidirectionally when the EV is plugged in. Assume that each EV can leave and re-join the system infinitely many times. For the $i$-th EV, denote $t_{ir,k}\in\mathcal{T}$ as its $k$-th returning time slot and $t_{il,k}\in\mathcal{T}$ as its $k$-th leaving time slot, with $t_{ir,k}<t_{il,k}$, $\forall k\in\{1,2,\dots\}$. In particular, if the $i$-th EV is in the system at $t=0$, let $t_{ir,1}=0$. Define the set of returning time slots of the $i$-th EV as $\mathcal{T}_{i,r}=\{t_{ir,1},t_{ir,2},\dots\}$, and the set of leaving time slots as $\mathcal{T}_{i,l}=\{t_{il,1},t_{il,2},\dots\}$. Define

$$n_i(t)=\max\{k : t_{il,k}<t,\ k\in\{1,2,\dots\}\} \qquad (1)$$

as the total number of times that the $i$-th EV has left the system until time slot $t$. If no such $k$ exists, let $n_i(t)=0$. Hence, from (1), we have $0\le n_i(t)\le t$. From the assumption that each EV can leave and re-join the system infinitely many times, we have $\lim_{t\to\infty}n_i(t)=\infty$. Define

$$\mathcal{T}_{i,p}=\bigcup_{k\ge 1}\{t_{ir,k},\ t_{ir,k}+1,\ \dots,\ t_{il,k}-1\}$$

as the set containing all participating time slots of the $i$-th EV. In other words, the $i$-th EV is in the system for any $t\in\mathcal{T}_{i,p}$.

Define $l_{i,t}=1$ if $t\in\mathcal{T}_{i,p}$ and $l_{i,t}=0$ otherwise, and $\mathbf{l}_t=[l_{1,t},\dots,l_{N,t}]$.

At the beginning of each time slot, the aggregator allocates the required regulation energy amount $|G_t|$ among all present EVs. Denote $x_{id,t}\ge 0$ as the amount of regulation down energy allocated to the $i$-th EV, and $x_{iu,t}\ge 0$ as the amount of regulation up energy contributed by the $i$-th EV. Due to charging/discharging circuit limitations, assume that $x_{id,t}$ and $x_{iu,t}$ are upper bounded by $x_{i,\max}>0$. If the $i$-th EV is not in the system at time slot $t$, then $x_{id,t}=x_{iu,t}=0$. Define $\mathbf{x}_{d,t}=[x_{1d,t},\dots,x_{Nd,t}]$ and $\mathbf{x}_{u,t}=[x_{1u,t},\dots,x_{Nu,t}]$.

Assume that the $i$-th EV is in the system at some $t\in\mathcal{T}_{i,p}$. Denote $s_{i,t}$ as its energy state at the beginning of time slot $t$, restricted by $0\le s_{i,t}\le s_{i,\mathrm{cap}}$, where $s_{i,\mathrm{cap}}$ is the battery capacity of the EV. After regulation service at time slot $t$, the energy state of the $i$-th EV at time slot $t+1$ is given by

$$s_{i,t+1}=s_{i,t}+1_{d,t}x_{id,t}-1_{u,t}x_{iu,t}\triangleq s_{i,t}+b_{i,t}, \qquad (2)$$

where $1_{d,t}=1$ if $G_t>0$ and $0$ otherwise, $1_{u,t}=1$ if $G_t<0$ and $0$ otherwise, and $b_{i,t}\triangleq 1_{d,t}x_{id,t}-1_{u,t}x_{iu,t}$. Note that $1_{d,t}1_{u,t}=0$, $\forall t$, since regulation down and up services cannot happen at the same time. Charging a battery to near its capacity or discharging it to near the zero energy state can significantly reduce the battery's lifetime [18]. Therefore, lower and upper bounds on the battery energy state are usually imposed by its manufacturer or user. Denote $[s_{i,\min},s_{i,\max}]$ as the preferred energy range of the $i$-th EV, with $0\le s_{i,\min}<s_{i,\max}\le s_{i,\mathrm{cap}}$. Under this constraint, the resultant energy state at time slot $t+1$ should satisfy

$$s_{i,\min}\le s_{i,t+1}\le s_{i,\max},$$

which means that the allocated regulation amounts $x_{id,t}$ and $x_{iu,t}$ of the $i$-th EV should satisfy $0\le x_{id,t}\le h_{id,t}$ and $0\le x_{iu,t}\le h_{iu,t}$, respectively, where

$$h_{id,t}=\begin{cases}[s_{i,\max}-s_{i,t},\ x_{i,\max}]^-, & \text{if } l_{i,t}=1,\\ 0, & \text{otherwise,}\end{cases}$$

and

$$h_{iu,t}=\begin{cases}[s_{i,t}-s_{i,\min},\ x_{i,\max}]^-, & \text{if } l_{i,t}=1,\\ 0, & \text{otherwise.}\end{cases}$$
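The bookkeeping above can be sketched in a few lines of Python. This is an illustration only: the helper names (`n_leaves`, `participating`, `energy_update`, `regulation_bounds`) are hypothetical, each EV's returning and leaving slots are assumed to be given as increasing lists (the last return may not yet be followed by a departure), and energies are plain floats in kWh.

```python
import math

def n_leaves(t, leave_slots):
    """n_i(t) in (1): largest k with t_{il,k} < t (0 if none).

    Leaving slots are increasing, so this equals the number of such slots.
    """
    return sum(1 for tl in leave_slots if tl < t)

def participating(t, return_slots, leave_slots):
    """l_{i,t}: 1 if t_{ir,k} <= t < t_{il,k} for some k, else 0."""
    # An EV that has returned but not yet left is treated as present from then on.
    ends = list(leave_slots) + [math.inf] * (len(return_slots) - len(leave_slots))
    return int(any(tr <= t < tl for tr, tl in zip(return_slots, ends)))

def energy_update(s, G, x_d, x_u):
    """(2): s_{i,t+1} = s_{i,t} + b_{i,t}, b_{i,t} = 1_{d,t} x_{id,t} - 1_{u,t} x_{iu,t}."""
    b = (x_d if G > 0 else 0.0) - (x_u if G < 0 else 0.0)
    return s + b

def regulation_bounds(s, s_min, s_max, x_max, present):
    """(h_{id,t}, h_{iu,t}): largest amounts keeping s_{i,t+1} in [s_min, s_max]."""
    if not present:
        return 0.0, 0.0
    return min(s_max - s, x_max), min(s - s_min, x_max)

# Example: returns at slots 0, 8, 15 and departures at slots 5, 12.
print(n_leaves(6, [5, 12]), participating(6, [0, 8, 15], [5, 12]))  # prints: 1 0
print(regulation_bounds(31.8, 8.0, 32.0, 0.83, True))
```

Near the top of its preferred range (31.8 kWh of a 32 kWh ceiling), the EV in the last line can absorb only about 0.2 kWh of regulation down, while its regulation up capability is still limited by $x_{i,\max}$.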



From time to time, the $i$-th EV may need to stop its regulation service and leave the system for personal reasons or for self-charging/discharging purposes. When the EV is out of the system, it cannot perform regulation service, and the aggregator has no information about its energy state. When returning, the EV may have a different energy state from its last leaving energy state. Assume that all returning energy states of the $i$-th EV lie in the preferred energy range, i.e., $s_{i,\min}\le s_{i,t}\le s_{i,\max}$, $\forall t\in\mathcal{T}_{i,r}$, by the EV's self-control. Further, define

$$\Delta_{i,k}\triangleq s_{i,t_{il,k}}-s_{i,t_{ir,k+1}},\quad \forall k\in\{1,2,\dots\}$$

as the difference between the EV's $k$-th leaving energy state and $(k+1)$-th returning energy state, and assume that

$$\lim_{K\to\infty}\mathbb{E}\Big[\frac{1}{K}\sum_{k=1}^{K}\Delta_{i,k}\Big]=0.$$

It can be shown that this condition holds if $\Delta_{i,k}$ is i.i.d. with zero mean and finite variance, which is a rather mild assumption considering the random behaviour of each EV.

For each EV, the regulation service gain comes at the cost of battery degradation due to frequent charging/discharging activities. Denote $C_i(x)$ as the degradation cost function of the regulation amount $x$ for the $i$-th EV, with $0\le C_i(x)\le C_{i,\max}$ and $C_i(0)=0$. Since faster charging or discharging, i.e., a larger value of $x_{id,t}$ or $x_{iu,t}$, has a more detrimental effect on the battery's lifetime, we assume $C_i(x)$ to be convex, continuous, and non-decreasing. We further assume that each EV imposes an upper bound $c_{i,\mathrm{up}}$, $0<c_{i,\mathrm{up}}<C_{i,\max}$, on the time-averaged battery degradation, expressed by $\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\big[1_{d,t}C_i(x_{id,t})+1_{u,t}C_i(x_{iu,t})\big]\le c_{i,\mathrm{up}}$.

Finally, the total regulation amount provided by the EVs may not be sufficient to serve the requested regulation amount. For brevity, define

$$x_{i,t}\triangleq 1_{d,t}x_{id,t}+1_{u,t}x_{iu,t}$$

as the regulation amount allocated to the $i$-th EV at time slot $t$. Then, insufficient regulation means that $\sum_{i=1}^N x_{i,t}<|G_t|$ for regulation down or up. This could be due to, for example, a lack of participating EVs or a high cost of battery degradation. The gap between $\sum_{i=1}^N x_{i,t}$ and $|G_t|$ represents an energy surplus in the case of regulation down, or an energy deficit in the case of regulation up. Such a surplus or deficit must be cleared, or the regulation service fails. Therefore, from time to time, the aggregator may need to exploit more expensive external energy sources, such as the traditional regulation market. Denote the unit costs of clearing energy surplus and energy deficit at time slot $t$ as $e_{s,t}$ and $e_{d,t}$, respectively, which are both random but restricted to $[e_{\min},e_{\max}]$. Then, the cost for the aggregator at time slot $t$ is

$$e_t=1_{d,t}e_{s,t}\Big(G_t-\sum_{i=1}^N x_{id,t}\Big)+1_{u,t}e_{d,t}\Big(|G_t|-\sum_{i=1}^N x_{iu,t}\Big).$$


B. Fair Regulation Allocation through Welfare Maximization 

The objective of the aggregator is to maximize the long- 
term social welfare of the aggregator-EVs system, i.e., to 
fairly allocate the regulation amount among the EVs, while 
respecting EVs' battery degradation constraints and reducing 
the need to utilize the expensive external energy sources. To 
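this end, we formulate the regulation allocation problem as the following stochastic optimization problem:

P1:
$$\max_{\mathbf{x}_{d,t},\,\mathbf{x}_{u,t}}\ \sum_{i=1}^{N}\omega_i U\Big(\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[x_{i,t}]\Big)-\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[e_t]$$

s.t.
$$0\le x_{id,t}\le h_{id,t},\ \forall i, \qquad (3)$$
$$0\le x_{iu,t}\le h_{iu,t},\ \forall i, \qquad (4)$$
$$\sum_{i=1}^{N}x_{id,t}\le 1_{d,t}G_t, \qquad (5)$$
$$\sum_{i=1}^{N}x_{iu,t}\le 1_{u,t}|G_t|, \qquad (6)$$
$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\big[1_{d,t}C_i(x_{id,t})+1_{u,t}C_i(x_{iu,t})\big]\le c_{i,\mathrm{up}},\ \forall i, \qquad (7)$$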



where $\omega_i>0$ is the normalized weight associated with the $i$-th EV, and $U(\cdot)$ is a utility function assumed to be concave, continuous, and non-decreasing, with a domain bounded within $[0,x_{i,\max}]$, $\forall i$. Furthermore, to facilitate later analysis, we make the mild assumption that the utility function $U(\cdot)$ satisfies

$$U(x)\le U(0)+\mu x,\quad \forall x\in\Big[0,\max_{1\le i\le N}\{x_{i,\max}\}\Big], \qquad (8)$$

where $\mu>0$. One sufficient condition for (8) to hold is that $U(\cdot)$ has a finite positive derivative at zero, e.g., $U(x)=\log(1+x)$. The expectations in the above optimization problem are taken over the randomness of the system inputs.
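For a concave $U(\cdot)$ that is differentiable at zero, (8) follows from the tangent-line inequality for concave functions. The short derivation below, which assumes $U'(0)$ is finite and positive, makes this explicit and gives $\mu$ for the example $U(x)=\log(1+x)$.

```latex
\begin{align*}
U(x) &\le U(0) + U'(0)\,x
  && \text{(a concave function lies below its tangent at } 0\text{)}\\
     &= U(0) + \mu x, \qquad \mu \triangleq U'(0) > 0, \qquad x \ge 0,\\
U(x) = \log(1+x):\quad \mu &= \frac{1}{1+x}\Big|_{x=0} = 1
  \quad\Longrightarrow\quad \log(1+x) \le x .
\end{align*}
```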

Remarks: In the objective function of P1, the first term considers each EV's welfare under the utility function $U(\cdot)$ and the weight $\omega_i$, and the second term reflects the aggregator's cost, which is affected by the regulation amounts of all EVs. In (3) and (4), a hard constraint is set on each EV's regulation amount at each time slot, while in (7), a long-term average constraint is set on the battery degradation cost due to regulation allocation. Note that constraints (5) and (6) ensure that $x_{id,t}=0$ for regulation up service ($G_t<0$), and $x_{iu,t}=0$ for regulation down service ($G_t>0$). These two constraints couple the regulation amounts of all EVs.

III. Welfare-Maximizing Regulation Allocation 

In this section, we first apply two successive reformulations to P1, and then propose a real-time welfare-maximizing regulation allocation (WMRA) algorithm to solve the resultant optimization problem. The performance analysis of WMRA is presented in Section IV.



A. Problem Transformation 

The objective of P1 contains a function of a long-term average, which complicates the problem. However, in general, such a problem can be converted into one that maximizes a long-term average of the function [16]. Specifically, we transform P1 as follows.

We first introduce an auxiliary $N$-dimensional vector $\mathbf{z}_t=[z_{1,t},\dots,z_{N,t}]$ with the constraints

$$0\le z_{i,t}\le x_{i,\max},\ \forall i, \qquad (9)$$

and

$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[z_{i,t}]=\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[x_{i,t}],\ \forall i. \qquad (10)$$

From the above constraints, the auxiliary variable $z_{i,t}$ and the regulation allocation amount $x_{i,t}$ lie within the same range and have the same long-term average behaviour. We now consider the following problem.
P2:
$$\max_{\mathbf{x}_{d,t},\,\mathbf{x}_{u,t},\,\mathbf{z}_t}\ \lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\Big[\sum_{i=1}^{N}\omega_iU(z_{i,t})-e_t\Big]$$
$$\text{s.t. } (3),\ (4),\ (5),\ (6),\ (7),\ (9),\ \text{and } (10).$$

Compared to P1, the optimization in P2 is over $\mathbf{x}_{d,t}$, $\mathbf{x}_{u,t}$, and $\mathbf{z}_t$, with two more constraints, (9) and (10). Note that P2 contains no function of a time average; instead, it maximizes a long-term time average of the expected social welfare.

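Denote $(\mathbf{x}^{\mathrm{opt}}_{d,t},\mathbf{x}^{\mathrm{opt}}_{u,t})$ as an optimal solution to P1, and $(\mathbf{x}^{*}_{d,t},\mathbf{x}^{*}_{u,t},\mathbf{z}^{*}_t)$ as an optimal solution to P2. Define $\mathbf{z}^{\mathrm{opt}}=[z^{\mathrm{opt}}_1,\dots,z^{\mathrm{opt}}_N]$ with

$$z^{\mathrm{opt}}_i=\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[x^{\mathrm{opt}}_{i,t}],\ \forall i.$$

Denote the objective functions of P1 and P2 as $f_1(\cdot)$ and $f_2(\cdot)$, respectively. The equivalence of P1 and P2 is stated below.

Lemma 1: P1 and P2 have the same optimal objective, i.e., $f_1(\mathbf{x}^{\mathrm{opt}}_{d,t},\mathbf{x}^{\mathrm{opt}}_{u,t})=f_2(\mathbf{x}^{*}_{d,t},\mathbf{x}^{*}_{u,t},\mathbf{z}^{*}_t)$. Furthermore, $(\mathbf{x}^{\mathrm{opt}}_{d,t},\mathbf{x}^{\mathrm{opt}}_{u,t},\mathbf{z}^{\mathrm{opt}})$ is an optimal solution to P2, and $(\mathbf{x}^{*}_{d,t},\mathbf{x}^{*}_{u,t})$ is an optimal solution to P1.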

Proof: The proof follows the general framework given in [16]. Details specific to our system are given in Appendix A. ■

Lemma 1 indicates that the transformation from P1 to P2 results in no loss of optimality. Thus, in the following, we focus on solving P2 instead.

B. Problem Relaxation 

P2 is still a challenging problem to solve since, in constraints (3) and (4), the regulation allocation amount of each EV depends on its current energy state $s_{i,t}$, which is coupled with all previous regulation allocation amounts. To avoid such coupling, we relax the constraints on $x_{id,t}$ and $x_{iu,t}$ and introduce P3 below.

P3:
$$\max_{\mathbf{x}_{d,t},\,\mathbf{x}_{u,t},\,\mathbf{z}_t}\ \lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\Big[\sum_{i=1}^{N}\omega_iU(z_{i,t})-e_t\Big]$$

s.t.
$$0\le x_{id,t}\le l_{i,t}x_{i,\max},\ \forall i, \qquad (11)$$
$$0\le x_{iu,t}\le l_{i,t}x_{i,\max},\ \forall i, \qquad (12)$$
$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[b_{i,t}]=0,\ \forall i, \qquad (13)$$
$$(5),\ (6),\ (7),\ (9),\ \text{and } (10),$$

where $b_{i,t}$ in (13) is defined below (2). In P3, we have replaced the constraints (3) and (4) in P2 with (11)–(13), thus removing the dependency on the current energy state $s_{i,t}$. We next demonstrate that any $(\mathbf{x}_{d,t},\mathbf{x}_{u,t})$ that meets (3) and (4) also satisfies (11)–(13), i.e., any feasible solution of P2 is also feasible for P3.

Considering the $i$-th EV, the constraints (3) and (4) in P2 are equivalent to the following two sub-constraints: if $l_{i,t}=1$, then

$$0\le x_{id,t}\le x_{i,\max}, \qquad (14)$$
$$0\le x_{iu,t}\le x_{i,\max}, \qquad (15)$$
$$s_{i,\min}\le s_{i,t}+b_{i,t}\le s_{i,\max}; \qquad (16)$$

if $l_{i,t}=0$, then

$$x_{id,t}=x_{iu,t}=0. \qquad (17)$$

Since $s_{i,t}$ is also bounded for any returning time slot $t\in\mathcal{T}_{i,r}$, together with (16), we have $s_{i,\min}\le s_{i,t}\le s_{i,\max}$, $\forall t\in\mathcal{T}_{i,p}\cup\mathcal{T}_{i,l}$. Note that (14), (15), and (17) imply (11) and (12), so we are left to justify that the boundedness of $s_{i,t}$ implies (13).
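Lemma 2: For the $i$-th EV, assume that $\lim_{K\to\infty}\mathbb{E}\big[\frac{1}{K}\sum_{k=1}^{K}\Delta_{i,k}\big]=0$ and $\lim_{t\to\infty}n_i(t)=\infty$. If $s_{i,\min}\le s_{i,t}\le s_{i,\max}$, $\forall t\in\mathcal{T}_{i,p}\cup\mathcal{T}_{i,l}$, then the constraint (13) holds, i.e., $\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[b_{i,t}]=0$.

Proof: See Appendix B. ■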

From Lemma 2, we know that the boundedness of $s_{i,t}$ indeed implies (13), which completes our demonstration that P3 is a relaxed version of P2 with a larger feasible solution set. We will later show that our proposed real-time algorithm for P3 ensures that (3) and (4) are always satisfied, and thus provides a feasible solution to P2 and to the original problem P1.

The relaxed problem P3 allows us to apply Lyapunov optimization to design a real-time algorithm for welfare maximization. This relaxation technique for accommodating time-coupled action constraints such as (3) and (4) was first introduced in [19] for a power-cost minimization problem in data centers equipped with energy storage. Unlike in [19], the structure of our problem is more complicated: the dynamics of the distributed storage units (EVs) are considered, the objective is nonlinear, and the energy requirement $G_t$ can take both positive and negative values. Thus, the algorithm design is more involved to ensure that the original constraints in P2 are satisfied.

C. WMRA Algorithm 

In this subsection, we propose a WMRA algorithm to solve P3 by employing the Lyapunov optimization technique.

We first define three virtual queues for each EV, with the associated queue backlogs $J_{i,t}$, $H_{i,t}$, and $K_{i,t}$. The evolution of $J_{i,t}$, $H_{i,t}$, and $K_{i,t}$, $\forall i$, is as follows:

$$J_{i,t+1}=\big[J_{i,t}+1_{d,t}C_i(x_{id,t})+1_{u,t}C_i(x_{iu,t})-c_{i,\mathrm{up}}\big]^+, \qquad (18)$$
$$H_{i,t+1}=H_{i,t}+z_{i,t}-x_{i,t}, \qquad (19)$$
$$K_{i,t}=\begin{cases}s_{i,t}-c_i, & \text{if } t\in\mathcal{T}_{i,r},\\ K_{i,t-1}+b_{i,t-1}, & \text{otherwise,}\end{cases} \qquad (20)$$
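where in (20) we design the constant $c_i\triangleq s_{i,\min}+2x_{i,\max}+V(\omega_i\mu+e_{\max})$, with $V\in(0,V_{\max}]$ and

$$V_{\max}\triangleq\min_{1\le i\le N}\Big\{\frac{s_{i,\max}-s_{i,\min}-4x_{i,\max}}{2(\omega_i\mu+e_{\max})}\Big\}. \qquad (21)$$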



The role of $V$ will be explained later. It will also become clear that the specific expressions of $c_i$ and $V_{\max}$ are chosen to ensure the boundedness of the energy state $s_{i,t}$. Note that $x_{i,\max}$ is generally much smaller than the energy capacity. For example, for the Tesla Model S [20], the energy capacity is 40 kWh, and $x_{i,\max}=0.83$ kWh if the maximum charging rate is applied and the regulation duration is 5 minutes. Therefore, in general we have $V_{\max}>0$.
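As a quick numerical check of these figures, the sketch below recovers $x_{i,\max}\approx 0.83$ kWh from an assumed 10 kW maximum charging rate (the rate implied by 0.83 kWh over 5 minutes) and evaluates $V_{\max}$ and $c_i$ as given with (21) for a single hypothetical EV whose preferred range is 20% to 80% of a 40 kWh battery, with $\omega_i=1$, $\mu=1$ (the value for $U(x)=\log(1+x)$), and $e_{\max}=1$. These parameter values and function names are illustrative assumptions, not values from this paper.

```python
def max_regulation_energy(rate_kw, slot_minutes):
    """x_{i,max}: energy (kWh) moved in one slot at the maximum charging rate."""
    return rate_kw * slot_minutes / 60.0

def shift_and_vmax(s_min, s_max, x_max, w, mu, e_max, V=None):
    """V_max and the shift c_i as given with (21), for a single EV."""
    v_max = (s_max - s_min - 4.0 * x_max) / (2.0 * (w * mu + e_max))
    if V is None:
        V = v_max  # use the largest admissible V
    c = s_min + 2.0 * x_max + V * (w * mu + e_max)
    return v_max, c

x_max = max_regulation_energy(10.0, 5.0)  # about 0.833 kWh
v_max, c = shift_and_vmax(0.2 * 40.0, 0.8 * 40.0, x_max, w=1.0, mu=1.0, e_max=1.0)
print(f"x_max = {x_max:.3f} kWh, V_max = {v_max:.3f}, c_i = {c:.3f}")
# prints: x_max = 0.833 kWh, V_max = 5.167, c_i = 20.000
```

With $V=V_{\max}$ and a single EV, $c_i$ reduces to the midpoint $(s_{i,\min}+s_{i,\max})/2$, so $K_{i,t}=s_{i,t}-c_i$ is centred at zero.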

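From (20), $K_{i,t}$ is re-initialized as a shifted version of $s_{i,t}$ every time the $i$-th EV returns to the aggregator-EVs system; moreover, $K_{i,t}$ evolves in the same way as $s_{i,t}$ for $t\in\mathcal{T}_{i,p}\cup\mathcal{T}_{i,l}$. Therefore, (20) implies that $K_{i,t}=s_{i,t}-c_i$, $\forall t\in\mathcal{T}_{i,p}\cup\mathcal{T}_{i,l}$. In addition, since $b_{i,t}=0$ when $l_{i,t}=0$, we have $K_{i,t}=K_{i,t_{il,k}}$, $\forall t\in\{t_{il,k},\dots,t_{ir,k+1}-1\}$ and $\forall k\in\{1,2,\dots\}$.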

By introducing the virtual queues, the constraints (7) and (10) hold if the queues $J_{i,t}$ and $H_{i,t}$, respectively, are mean rate stable [16]. Below we give the definition of mean rate stability of a queue.

Definition: A discrete-time process $Q(t)$ is mean rate stable if $\lim_{t\to\infty}\frac{\mathbb{E}[|Q(t)|]}{t}=0$.

Unlike $J_{i,t}$ and $H_{i,t}$, since $K_{i,t}$ is re-initialized when $t\in\mathcal{T}_{i,r}$, a new virtual queue is essentially created every time the $i$-th EV re-joins the system. Therefore, the mean rate stability of $K_{i,t}$ is not sufficient for the constraint (13) to hold, and a stronger condition is required. Since $K_{i,t}$ is just a shifted version of $s_{i,t}$ for $t\in\mathcal{T}_{i,p}\cup\mathcal{T}_{i,l}$, the following lemma follows directly from Lemma 2.

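Lemma 3: For the $i$-th EV, assume that $\lim_{K\to\infty}\mathbb{E}\big[\frac{1}{K}\sum_{k=1}^{K}\Delta_{i,k}\big]=0$ and $\lim_{t\to\infty}n_i(t)=\infty$. If $s_{i,\min}-c_i\le K_{i,t}\le s_{i,\max}-c_i$, $\forall t\in\mathcal{T}_{i,p}\cup\mathcal{T}_{i,l}$, then the constraint (13) holds, i.e., $\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[b_{i,t}]=0$.

Later we will show that our proposed algorithm guarantees the boundedness condition on $K_{i,t}$ in Lemma 3.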

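Define $\mathbf{J}_t=[J_{1,t},\dots,J_{N,t}]$, $\mathbf{H}_t=[H_{1,t},\dots,H_{N,t}]$, $\mathbf{K}_t=[K_{1,t},\dots,K_{N,t}]$, and $\boldsymbol{\Theta}_t=[\mathbf{H}_t,\mathbf{J}_t,\mathbf{K}_t]$ with the initial value $\boldsymbol{\Theta}_0=\mathbf{0}$. Define the Lyapunov function

$$L(\boldsymbol{\Theta}_t)\triangleq\frac{1}{2}\sum_{i=1}^{N}\big(H_{i,t}^2+J_{i,t}^2+K_{i,t}^2\big),$$

and the associated one-slot Lyapunov drift as $\Delta(\boldsymbol{\Theta}_t)\triangleq\mathbb{E}\big[L(\boldsymbol{\Theta}_{t+1})-L(\boldsymbol{\Theta}_t)\mid\boldsymbol{\Theta}_t\big]$. Let the drift-minus-welfare function be $\Delta(\boldsymbol{\Theta}_t)-V\mathbb{E}\big[\sum_{i=1}^{N}\omega_iU(z_{i,t})-e_t\mid\boldsymbol{\Theta}_t\big]$, where $V\in[0,V_{\max}]$ is the weight associated with the welfare objective. Therefore, the larger $V$, the more weight is put on the welfare objective in the drift-minus-welfare function. We give an upper bound on the drift-minus-welfare function in the following proposition.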

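Proposition 1: The drift-minus-welfare function is upper bounded as

$$\begin{aligned}\Delta(\boldsymbol{\Theta}_t)-V\mathbb{E}\Big[\sum_{i=1}^{N}\omega_iU(z_{i,t})-e_t\,\Big|\,\boldsymbol{\Theta}_t\Big]\le{}& B+\sum_{i=1}^{N}K_{i,t}\mathbb{E}[b_{i,t}\mid\boldsymbol{\Theta}_t]+\sum_{i=1}^{N}H_{i,t}\mathbb{E}[z_{i,t}-x_{i,t}\mid\boldsymbol{\Theta}_t]\\&+\sum_{i=1}^{N}J_{i,t}\mathbb{E}\big[1_{d,t}C_i(x_{id,t})+1_{u,t}C_i(x_{iu,t})-c_{i,\mathrm{up}}\mid\boldsymbol{\Theta}_t\big]\\&-V\mathbb{E}\Big[\sum_{i=1}^{N}\omega_iU(z_{i,t})-e_t\,\Big|\,\boldsymbol{\Theta}_t\Big],\end{aligned} \qquad (22)$$

where $B=\frac{1}{2}\sum_{i=1}^{N}\big(x_{i,\max}^2+[x_{i,\max}^2,c_i^2]^++[c_{i,\mathrm{up}}^2,(C_{i,\max}-c_{i,\mathrm{up}})^2]^+\big)$ and $V\in[0,V_{\max}]$.

Proof: See Appendix C. ■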
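To indicate where two of the terms in $B$ come from, the per-EV squared-queue differences can be bounded as follows. This is only a sketch of part of the argument: $y_{i,t}\triangleq 1_{d,t}C_i(x_{id,t})+1_{u,t}C_i(x_{iu,t})\in[0,C_{i,\max}]$ is a shorthand introduced here, and the complete proof, including the $K_{i,t}$ term, is the one in Appendix C.

```latex
\begin{align*}
H_{i,t+1}^2 - H_{i,t}^2
  &= 2H_{i,t}(z_{i,t}-x_{i,t}) + (z_{i,t}-x_{i,t})^2
   \le 2H_{i,t}(z_{i,t}-x_{i,t}) + x_{i,\max}^2,\\
J_{i,t+1}^2 - J_{i,t}^2
  &\le \big(J_{i,t}+y_{i,t}-c_{i,\mathrm{up}}\big)^2 - J_{i,t}^2
   = 2J_{i,t}(y_{i,t}-c_{i,\mathrm{up}}) + (y_{i,t}-c_{i,\mathrm{up}})^2\\
  &\le 2J_{i,t}(y_{i,t}-c_{i,\mathrm{up}})
   + \big[c_{i,\mathrm{up}}^2,\,(C_{i,\max}-c_{i,\mathrm{up}})^2\big]^+ .
\end{align*}
```

The first line uses $z_{i,t},x_{i,t}\in[0,x_{i,\max}]$. The second uses $([a]^+)^2\le a^2$ and the fact that the convex function $(y-c_{i,\mathrm{up}})^2$ attains its maximum over $[0,C_{i,\max}]$ at an endpoint. Summing over $i$, halving, and taking conditional expectations gives the $H$ and $J$ contributions to (22).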
We now propose the WMRA algorithm, which minimizes the upper bound on the drift-minus-welfare function in (22) at each time slot. This is equivalent to solving the following decoupled sub-problems with respect to $\mathbf{z}_t$, $\mathbf{x}_{d,t}$, and $\mathbf{x}_{u,t}$ separately. Denote the auxiliary vector and the allocated regulation down and up energy amount vectors produced by WMRA as $\tilde{\mathbf{z}}_t=[\tilde{z}_{1,t},\dots,\tilde{z}_{N,t}]$, $\tilde{\mathbf{x}}_{d,t}=[\tilde{x}_{1d,t},\dots,\tilde{x}_{Nd,t}]$, and $\tilde{\mathbf{x}}_{u,t}=[\tilde{x}_{1u,t},\dots,\tilde{x}_{Nu,t}]$, respectively. Specifically, we obtain $\tilde{z}_{i,t}$, $\forall i$, by solving (a):

$$\text{(a):}\ \min_{z_{i,t}}\ H_{i,t}z_{i,t}-\omega_iVU(z_{i,t})\quad\text{s.t. } 0\le z_{i,t}\le x_{i,\max}.$$
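For the example utility $U(x)=\log(1+x)$ mentioned after (8), sub-problem (a) has a closed-form solution: setting the derivative $H_{i,t}-\omega_iV/(1+z)$ to zero gives $z=\omega_iV/H_{i,t}-1$, which is then projected onto $[0,x_{i,\max}]$; if $H_{i,t}\le 0$, the objective is non-increasing in $z$, so $z=x_{i,\max}$. The sketch below assumes this specific utility (other concave utilities would need their own solution or a numerical solver), and the function names are hypothetical.

```python
import math

def solve_a_log(H, w, V, x_max):
    """Minimize H*z - w*V*log(1+z) over 0 <= z <= x_max (sub-problem (a))."""
    if H <= 0:
        return x_max              # objective is non-increasing in z
    z = w * V / H - 1.0           # stationary point of the convex objective
    return min(max(z, 0.0), x_max)  # projection onto [0, x_max]

def objective_a(z, H, w, V):
    """Objective of (a) for U(z) = log(1+z)."""
    return H * z - w * V * math.log1p(z)

print(solve_a_log(4.0, 1.0, 5.0, 0.83))  # 5/4 - 1 = 0.25
```

Because the objective is convex in one variable, projecting the stationary point onto the interval is enough to obtain the minimizer.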

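For $G_t>0$, we obtain $\tilde{\mathbf{x}}_{d,t}$ by solving (b1):

$$\text{(b1):}\ \min_{\mathbf{x}_{d,t}}\ Ve_{s,t}\Big(G_t-\sum_{i=1}^{N}x_{id,t}\Big)-\sum_{i=1}^{N}H_{i,t}x_{id,t}+\sum_{i=1}^{N}J_{i,t}C_i(x_{id,t})+\sum_{i=1}^{N}K_{i,t}x_{id,t}$$
$$\text{s.t. } 0\le x_{id,t}\le l_{i,t}x_{i,\max},\ \forall i,\qquad \sum_{i=1}^{N}x_{id,t}\le G_t.$$

For $G_t<0$, we obtain $\tilde{\mathbf{x}}_{u,t}$ by solving (b2):

$$\text{(b2):}\ \min_{\mathbf{x}_{u,t}}\ Ve_{d,t}\Big(|G_t|-\sum_{i=1}^{N}x_{iu,t}\Big)-\sum_{i=1}^{N}H_{i,t}x_{iu,t}+\sum_{i=1}^{N}J_{i,t}C_i(x_{iu,t})-\sum_{i=1}^{N}K_{i,t}x_{iu,t}$$
$$\text{s.t. } 0\le x_{iu,t}\le l_{i,t}x_{i,\max},\ \forall i,\qquad \sum_{i=1}^{N}x_{iu,t}\le |G_t|.$$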
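Sub-problem (b1) is separable across EVs except for the coupling constraint on $\sum_i x_{id,t}$. As an illustration, assume a quadratic degradation cost $C_i(x)=\alpha_i x^2$ (our assumption; the model only requires $C_i$ to be convex and non-decreasing). Dropping the constant $Ve_{s,t}G_t$, EV $i$ contributes $(K_{i,t}-H_{i,t}-Ve_{s,t})x+J_{i,t}\alpha_i x^2$, and a Lagrange multiplier $\lambda\ge 0$ for the sum constraint can be found by bisection. This is one standard way to handle a single coupling constraint, not necessarily the solver used by the authors; the function name and parameters are hypothetical. (b2) is analogous after replacing $K_{i,t}$ by $-K_{i,t}$, $e_{s,t}$ by $e_{d,t}$, and $G_t$ by $|G_t|$.

```python
def solve_b1_quadratic(G, H, J, K, alpha, x_max, present, V, e_s, iters=200):
    """Approximately solve (b1) with C_i(x) = alpha_i * x**2 by bisection on lambda >= 0."""
    n = len(H)
    a = [K[i] - H[i] - V * e_s for i in range(n)]           # linear coefficient per EV
    q = [J[i] * alpha[i] for i in range(n)]                  # quadratic coefficient (>= 0)
    u = [x_max[i] if present[i] else 0.0 for i in range(n)]  # l_{i,t} * x_{i,max}

    def best_response(lam):
        """Per-EV minimizer of (a_i + lam) x + q_i x^2 over [0, u_i]."""
        x = []
        for i in range(n):
            g = a[i] + lam
            if q[i] > 0:
                x.append(min(max(-g / (2.0 * q[i]), 0.0), u[i]))
            else:                                            # linear case: all or nothing
                x.append(u[i] if g < 0 else 0.0)
        return x

    x0 = best_response(0.0)
    if sum(x0) <= G:                        # coupling constraint is inactive
        return x0
    lo, hi = 0.0, max(abs(v) for v in a) + 1.0   # every x_i is 0 at lambda = hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(best_response(mid)) > G:
            lo = mid
        else:
            hi = mid
    x, x_lo = best_response(hi), best_response(lo)
    gap = G - sum(x)
    # Hand the leftover budget to EVs whose allocation changes between lo and hi;
    # their marginal cost is within the bisection tolerance of the multiplier.
    for i in range(n):
        extra = min(gap, x_lo[i] - x[i])
        if extra > 0:
            x[i] += extra
            gap -= extra
    return x

# Two identical EVs (q_i = 1, a_i = -1) sharing a 0.4 regulation-down request.
print(solve_b1_quadratic(0.4, [0, 0], [1, 1], [0, 0], [1, 1], [0.8, 0.8], [True, True], 1.0, 1.0))
```

In the example the multiplier settles near $\lambda=0.6$, and each EV receives about $0.2$.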



Note that (a), (b1), and (b2) are all convex problems, so they can be efficiently solved using standard methods such as the interior point method and the Lagrange dual method [21]. We summarize WMRA in Algorithm 1. Note from Steps (2b) and (2c) that the solutions of (a) and (b1) (or (b2)) affect each other over multiple time slots through the update of $H_{i,t}$, $\forall i$. To perform WMRA, no statistical information of the system is needed, which makes the algorithm easy to implement.
Algorithm 1 Welfare-Maximizing Regulation Allocation (WMRA) Algorithm.
1: The aggregator initializes the virtual queue vector $\boldsymbol{\Theta}_0=\mathbf{0}$, and re-initializes $K_{i,t}=s_{i,t}-c_i$ for $t\in\mathcal{T}_{i,r}$, $\forall i$.
2: At the beginning of each time slot $t$, the aggregator performs the following steps sequentially.
(2a) Observe $G_t$, $e_{s,t}$, $e_{d,t}$, $\mathbf{l}_t$ (if $\mathbf{l}_t$ cannot be predicted), $\mathbf{J}_t$, $\mathbf{H}_t$, and $\mathbf{K}_t$.
(2b) Solve (a) and record an optimal solution $\tilde{\mathbf{z}}_t$. If $G_t>0$, solve (b1) and record an optimal solution $\tilde{\mathbf{x}}_{d,t}$. If $G_t<0$, solve (b2) and record an optimal solution $\tilde{\mathbf{x}}_{u,t}$. Allocate the regulation amounts based on $\tilde{\mathbf{x}}_{d,t}$ and $\tilde{\mathbf{x}}_{u,t}$. If $\sum_{i=1}^{N}\tilde{x}_{id,t}<G_t$ or $\sum_{i=1}^{N}\tilde{x}_{iu,t}<|G_t|$, then clear the imbalance using external energy sources.
(2c) Update the virtual queues $J_{i,t}$, $H_{i,t}$, and $K_{i,t}$, $\forall i$, based on (18), (19), and (20), respectively.
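The following toy simulation strings the steps of Algorithm 1 together under several simplifying assumptions that are ours, not the paper's: identical EVs, $U(z)=\log(1+z)$ (so $\mu=1$), a linear degradation cost $C_i(x)=\kappa x$ (which turns (b1)/(b2) into a linear program with one budget constraint that a greedy fill solves exactly), uniformly random $G_t$ and prices, and random departures and returns. The external clearing of any leftover imbalance is not modeled. It checks numerically that the energy states stay within $[s_{i,\min},s_{i,\max}]$, the property established in Lemma 6. All names and numbers are hypothetical.

```python
import random

def run_wmra(T=3000, N=5, seed=1):
    """Toy run of Algorithm 1; every numeric parameter here is hypothetical."""
    rng = random.Random(seed)
    s_min, s_max, x_max = 8.0, 32.0, 0.83          # kWh, identical EVs
    w, mu, e_max = 1.0, 1.0, 1.0                    # U(z) = log(1+z) gives mu = 1
    kappa, c_up = 0.5, 0.1                          # C_i(x) = kappa*x, budget c_{i,up}
    V = (s_max - s_min - 4 * x_max) / (2 * (w * mu + e_max))  # V = V_max
    c = s_min + 2 * x_max + V * (w * mu + e_max)               # shift c_i
    s = [rng.uniform(s_min, s_max) for _ in range(N)]
    present = [True] * N
    H, J = [0.0] * N, [0.0] * N
    K = [si - c for si in s]                        # step 1: K = s - c_i at arrival
    trace = []
    for _ in range(T):
        # Random departures/returns; a returning EV brings a new energy state.
        for i in range(N):
            if present[i] and rng.random() < 0.01:
                present[i] = False
            elif not present[i] and rng.random() < 0.05:
                present[i] = True
                s[i] = rng.uniform(s_min, s_max)
                K[i] = s[i] - c                     # re-initialization in (20)
        G = rng.uniform(-3.0, 3.0)
        e_s, e_d = rng.uniform(0.2, e_max), rng.uniform(0.2, e_max)
        # (a): closed form for U(z) = log(1+z)
        z = [x_max if H[i] <= 0 else min(max(w * V / H[i] - 1.0, 0.0), x_max)
             for i in range(N)]
        # (b1)/(b2) with linear C_i: fill the cheapest (most negative) coefficients first
        sign = 1.0 if G > 0 else -1.0               # b_{i,t} = sign * x_{i,t}
        e = e_s if G > 0 else e_d
        coef = [sign * K[i] - H[i] - V * e + J[i] * kappa for i in range(N)]
        x, budget = [0.0] * N, abs(G)
        for i in sorted(range(N), key=lambda j: coef[j]):
            if present[i] and coef[i] < 0 and budget > 0:
                x[i] = min(x_max, budget)
                budget -= x[i]
        # (2c): queue updates (18)-(20); absent EVs have x = 0 and hence b = 0
        for i in range(N):
            J[i] = max(J[i] + kappa * x[i] - c_up, 0.0)
            H[i] += z[i] - x[i]
            s[i] += sign * x[i]
            K[i] += sign * x[i]
        trace.append(list(s))
    return trace, s_min, s_max

trace, lo, hi = run_wmra()
print(min(min(r) for r in trace) >= lo - 1e-9, max(max(r) for r in trace) <= hi + 1e-9)
```

The greedy fill never allocates to an EV whose coefficient is non-negative, which is exactly what Lemma 4 requires when $K_{i,t}$ is near the edge of its range; the check should therefore print `True True`.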



IV. Performance Analysis 

In this section, we characterize the performance of WMRA with respect to our original problem P1.

A. Properties of WMRA Algorithm 

We now show that WMRA ensures the boundedness of each EV's energy state. The following lemma characterizes sufficient conditions under which the WMRA solutions $\tilde{x}_{id,t}$ and $\tilde{x}_{iu,t}$ are zero.

Lemma 4: Under the WMRA algorithm, for any $t\in\mathcal{T}_{i,p}$,

1) for $G_t>0$, if $K_{i,t}>x_{i,\max}+V(\omega_i\mu+e_{\max})$, then $\tilde{x}_{id,t}=0$, which means that $K_{i,t}$ does not increase in the next time slot; and

2) for $G_t<0$, if $K_{i,t}<-x_{i,\max}-V(\omega_i\mu+e_{\max})$, then $\tilde{x}_{iu,t}=0$, which means that $K_{i,t}$ does not decrease in the next time slot.

Proof: See Appendix D. ■

Since Lemma 4 provides conditions under which the queue backlog $K_{i,t}$ can no longer increase or decrease, we can use it to prove the boundedness of $K_{i,t}$ below.

Lemma 5: Under the WMRA algorithm, the queue backlog $K_{i,t}$ associated with the $i$-th EV is bounded by $s_{i,\min}-c_i\le K_{i,t}\le s_{i,\max}-c_i$, $\forall t\in\mathcal{T}_{i,p}\cup\mathcal{T}_{i,l}$.

Proof: Consider the set $\{t_{ir,k},t_{ir,k}+1,\dots,t_{il,k}\}$ for any $k\in\{1,2,\dots\}$. We show below by induction that $K_{i,t}$ is bounded for any $t$ in this set.

First consider the upper bound. For the time slot $t_{ir,k}$, based on (20) and $s_{i,t_{ir,k}}\le s_{i,\max}$, we have $K_{i,t_{ir,k}}\le s_{i,\max}-c_i$. Assume that the upper bound holds for time slot $t$, and consider the following two cases of $K_{i,t}$.

Case 1: $x_{i,\max}+V(\omega_i\mu+e_{\max})<K_{i,t}\le s_{i,\max}-c_i$. (We can check that $x_{i,\max}+V(\omega_i\mu+e_{\max})<s_{i,\max}-c_i$ since $V\le V_{\max}$.) For $G_t>0$, from Lemma 4 1), we have $\tilde{x}_{id,t}=0$. Therefore, $K_{i,t+1}=K_{i,t}\le s_{i,\max}-c_i$. For $G_t<0$, we have $K_{i,t+1}=K_{i,t}-\tilde{x}_{iu,t}\le K_{i,t}\le s_{i,\max}-c_i$.

Case 2: $K_{i,t}\le x_{i,\max}+V(\omega_i\mu+e_{\max})$. From (20), $K_{i,t+1}\le 2x_{i,\max}+V(\omega_i\mu+e_{\max})\le s_{i,\max}-c_i$, where the last inequality holds since $V\le V_{\max}$.

Now consider the lower bound. For the time slot $t_{ir,k}$, based on (20) and $s_{i,t_{ir,k}}\ge s_{i,\min}$, we have $K_{i,t_{ir,k}}\ge s_{i,\min}-c_i$. Assume that the lower bound holds for time slot $t$, and consider the following two cases of $K_{i,t}$.

Case 1': $s_{i,\min}-c_i\le K_{i,t}<-x_{i,\max}-V(\omega_i\mu+e_{\max})$. (We can check that $s_{i,\min}-c_i<-x_{i,\max}-V(\omega_i\mu+e_{\max})$ since $x_{i,\max}>0$.) For $G_t<0$, from Lemma 4 2), we have $\tilde{x}_{iu,t}=0$. Therefore, $K_{i,t+1}=K_{i,t}\ge s_{i,\min}-c_i$. For $G_t>0$, we have $K_{i,t+1}=K_{i,t}+\tilde{x}_{id,t}\ge K_{i,t}\ge s_{i,\min}-c_i$.

Case 2': $K_{i,t}\ge -x_{i,\max}-V(\omega_i\mu+e_{\max})$. From (20), $K_{i,t+1}\ge -2x_{i,\max}-V(\omega_i\mu+e_{\max})$, which is exactly $s_{i,\min}-c_i$. ■

Remarks: To track the energy state $s_{i,t}$, the shift $c_i$ could in principle be any number. However, to ensure the boundedness of $K_{i,t}$, the form of $c_i$ is uniquely determined by the proof of Case 2'. For the design of $V_{\max}$, to make the proof of Case 1 work, it is sufficient to let $V_{\max}=\min_{1\le i\le N}\big\{\frac{s_{i,\max}-s_{i,\min}-3x_{i,\max}-\epsilon}{2(\omega_i\mu+e_{\max})}\big\}$, where $\epsilon>0$ can be arbitrarily small. This $\epsilon$ is further determined as $x_{i,\max}$ by the proof of Case 2.

Note that Lemma 5 is a sample-path result. Therefore, it holds regardless of the statistics of the system. In addition, since the boundedness condition on $K_{i,t}$ in Lemma 3 is now satisfied, the conclusion of Lemma 3 holds under WMRA. Recall that $K_{i,t}=s_{i,t}-c_i$ for any $t\in\mathcal{T}_{i,p}\cup\mathcal{T}_{i,l}$. Using Lemma 5, the following lemma is straightforward.

Lemma 6: Under the WMRA algorithm, the energy state of the $i$-th EV is bounded by $s_{i,\min}\le s_{i,t}\le s_{i,\max}$, $\forall t\in\mathcal{T}_{i,p}\cup\mathcal{T}_{i,l}$.

Hence, from Lemma 6, the constraints (3) and (4) in P2 are met under WMRA.

B. Optimality of WMRA Algorithm 

In this subsection, we investigate the optimality of WMRA by considering EVs with both predictable and random dynamics, which are described below.

1) EVs with predictable dynamics: Predictable dynamics arise when each EV joins and leaves the aggregator-EVs system regularly (e.g., from 9 am to 12 pm, and then from 2 pm to 6 pm). In this case, an EV's leaving and returning time slots can be predicted by the aggregator, which means that the aggregator knows the realization of $\mathbf{l}_t$, $\forall t$, in advance and does not have to observe $\mathbf{l}_t$ in every time slot. The system state at time slot $t$ is then defined as $A_t=\{G_t,e_{s,t},e_{d,t}\}$.

2) EVs with random dynamics: If the EVs do not participate in the aggregator-EVs system regularly, then the aggregator cannot predict their dynamics beforehand, and therefore has to observe $\mathbf{l}_t$ in every time slot. In this case, the system state at time slot $t$ is defined as $A_t=\{G_t,e_{s,t},e_{d,t},\mathbf{l}_t\}$.

Note that the WMRA algorithm is the same in both cases. The only difference between them is that, in the optimization problem P3, the expectations are taken over different randomness of the system state. The performance of WMRA compared to the optimal solution of P1 is given in the following theorem, which applies to both predictable and random dynamics.

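Theorem 1: Given that the system state $A_t$ is i.i.d. over time, the following hold under WMRA:

1) $(\mathbf{x}_{d,t}, \mathbf{x}_{u,t})$ is feasible for P1, i.e., it satisfies all constraints of P1.

2) $f_1(\mathbf{x}_{d,t}, \mathbf{x}_{u,t}) \ge f_1(\mathbf{x}^{\mathrm{opt}}_{d,t}, \mathbf{x}^{\mathrm{opt}}_{u,t}) - B/V$,

where
$$B = \frac{1}{2}\sum_{i=1}^{N}\Big[x_{i,\max}^2 + \max\big\{C_{i,\mathrm{up}}^2, (c_{i,\max} - C_{i,\mathrm{up}})^2\big\} + \max\big\{x_{i,\max}^2, c_i^2\big\}\Big]$$
and $V \in [0, V_{\max}]$.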

Proof: See Appendix E. ■

Remarks: From Theorem 1, the welfare performance of WMRA is within $O(1/V)$ of the optimum. Hence, the larger $V$, the better the performance of WMRA. However, in practice, because of the limited battery capacity of the EVs, $V$ cannot be arbitrarily large and is upper bounded by $V_{\max}$, which is defined in Section IV-A. Note that $V_{\max}$ increases with the smallest span of the EVs' preferred energy ranges, i.e., $\min_{1 \le i \le N}\{s_{i,\max} - s_{i,\min}\}$. Therefore, roughly speaking, the performance gap between WMRA and the optimum decreases as the smallest battery capacity increases. Asymptotically, as the EVs' battery capacities go to infinity, WMRA achieves the optimum.

In Theorem 1, the i.i.d. condition on $A_t$ can be relaxed to a Markovian one, and a similar performance bound can be obtained.

Theorem 2: Given that the system state $A_t$ evolves according to a finite-state irreducible and aperiodic Markov chain, the following hold under WMRA:

1) $(\mathbf{x}_{d,t}, \mathbf{x}_{u,t})$ is feasible for P1, i.e., it satisfies all constraints of P1.

2) $f_1(\mathbf{x}_{d,t}, \mathbf{x}_{u,t}) \ge f_1(\mathbf{x}^{\mathrm{opt}}_{d,t}, \mathbf{x}^{\mathrm{opt}}_{u,t}) - O(1/V)$, where $V \in [0, V_{\max}]$.

Proof: The above results can be proved by extending the proof of Theorem 1 with a multi-slot drift technique [16]. We omit the proof here for brevity. ■

V. Simulation Results 

In this section, we simulate an aggregator-EVs system 
with parameters drawn from practical scenarios, and compare 
WMRA with a suboptimal greedy algorithm. 

Suppose that the aggregator is connected with $N = 100$ EVs, evenly split into Type I (Ford Focus Electric) and Type II (Tesla Model S). The parameters of Type I and Type II EVs are summarized in Table I [20], [22], where $x_{i,\max}$ is derived by assuming the regulation interval $\Delta t = 5$ minutes, which is the common $\Delta t$ in, for example, New England, New York, and Ontario. Consider that each EV has random dynamics, and the system state $A_t = (G_t, e_{s,t}, e_{d,t}, \mathbf{1}_t)$ follows a finite-state irreducible and aperiodic Markov chain. Specifically, at each time slot, the regulation energy amount $G_t$ is drawn uniformly from the discrete set $\{-69.2, -69.2 + \Delta_1, -69.2 + 2\Delta_1, \dots, 69.2\}$ (kWh) with cardinality 200, where 69.2 kWh is the maximum allowed energy amount at each time slot if all $N$ EVs are in the system. The unit costs of external sources, i.e., $e_{s,t}$ and $e_{d,t}$, are drawn uniformly from the discrete set $\{0.1, 0.1 + \Delta_2, 0.1 + 2\Delta_2, \dots, 0.12\}$ (dollars/kWh) with cardinality 200. The indicator random variable $1_{i,t}$, i.e., whether the i-th EV is in the system, follows a 2-state Markov chain as shown in Fig. 1. The returning energy state $s_{i,t_{ir,k+1}}$ of each EV is drawn uniformly from $[s_{i,t_{il,k}} - \Delta_3, s_{i,t_{il,k}} + \Delta_3]$ with $\Delta_3 = 5\% \, s_{i,\mathrm{cap}}$, and is guaranteed to be within $[s_{i,\min}, s_{i,\max}]$. We set $s_{i,\min} = 0.1 s_{i,\mathrm{cap}}$ and $s_{i,\max} = 0.9 s_{i,\mathrm{cap}}$ unless otherwise mentioned. In the objective function of P1, we set $U(x) = \log(1 + x)$ and $\omega_i = 1, \forall i$. The battery degradation cost function of each EV is $C_i(x) = x^2$, and the upper bound $C_{i,\mathrm{up}}$ is set to $x_{i,\max}^2/4$.

TABLE I
Parameters for Type I and Type II EVs

                              Type I EV    Type II EV
$s_{i,\mathrm{cap}}$ (kWh)    23           40
$x_{i,\max}$ (kWh)            0.55         0.83

Fig. 1. Transition probabilities of $1_{i,t}$, $\forall i$.
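The simulation setup above can be sketched in Python as a state generator. This is a minimal, hypothetical sketch and not the authors' MATLAB code. The transition probabilities of $1_{i,t}$ are placeholders because the values in Fig. 1 are not reproduced in the text, and clipping the returning energy state to $[s_{i,\min}, s_{i,\max}]$ is one assumed way of enforcing the stated guarantee. As a sanity check, Table I's rounded $x_{i,\max}$ values give $50 \times 0.55 + 50 \times 0.83 = 69.0$ kWh, close to the 69.2 kWh bound used for $G_t$; the small gap is presumably due to rounding in Table I.

```python
import random

# Table I parameters (kWh); x_max is the per-slot regulation limit for a 5-minute slot
S_CAP = {"I": 23.0, "II": 40.0}
X_MAX = {"I": 0.55, "II": 0.83}
DT_HOURS = 5 / 60
N = 100
TYPES = ["I"] * (N // 2) + ["II"] * (N // 2)  # evenly split fleet

# HYPOTHETICAL transition probabilities of the 2-state chain for 1_{i,t}
P_STAY_PRESENT = 0.9
P_STAY_ABSENT = 0.8


def uniform_grid(lo, hi, n):
    """Evenly spaced discrete set {lo, lo + d, ..., hi} with cardinality n."""
    d = (hi - lo) / (n - 1)
    return [lo + k * d for k in range(n)]


G_SET = uniform_grid(-69.2, 69.2, 200)     # regulation energy G_t (kWh)
PRICE_SET = uniform_grid(0.10, 0.12, 200)  # unit costs e_{s,t}, e_{d,t} ($/kWh)


def next_indicator(present, rng):
    """One step of the (hypothetical) 2-state Markov chain for 1_{i,t}."""
    stay = P_STAY_PRESENT if present else P_STAY_ABSENT
    return present if rng.random() < stay else not present


def returning_energy(s_leave, ev_type, rng, s_max_frac=0.9):
    """Draw s_{i,t_ir,k+1} uniformly within +/- 5% of capacity, then clip (assumed)."""
    cap = S_CAP[ev_type]
    d3 = 0.05 * cap
    s = rng.uniform(s_leave - d3, s_leave + d3)
    return min(max(s, 0.1 * cap), s_max_frac * cap)


def sample_state(indicators, rng):
    """Sample A_t = (G_t, e_{s,t}, e_{d,t}, 1_t) given the previous indicators."""
    g = rng.choice(G_SET)
    e_s = rng.choice(PRICE_SET)
    e_d = rng.choice(PRICE_SET)
    new_ind = [next_indicator(p, rng) for p in indicators]
    return g, e_s, e_d, new_ind


if __name__ == "__main__":
    rng = random.Random(0)
    print(sum(X_MAX[k] for k in TYPES))                 # about 69.0 kWh with rounded values
    print({k: v / DT_HOURS for k, v in X_MAX.items()})  # implied per-slot power (kW)
    print(sample_state([True] * N, rng)[:3])
```

Drawing $e_{s,t}$ and $e_{d,t}$ independently each slot is itself an assumption; the text only states that both are drawn uniformly from the same set.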

To allocate the requested regulation amount, we apply WMRA in Algorithm 1 at each time slot. The simulation is performed over 1000 time slots. The social welfare at time slot $t$ is taken to be the objective function of P1 with $T = t$. For comparison, we consider a greedy algorithm that only optimizes the system performance at the current time slot. Thus, its regulation allocation at each time slot is derived from an optimization problem that maximizes the per-slot social welfare over $(\mathbf{x}_{d,t}, \mathbf{x}_{u,t})$, subject to the per-slot constraints of P1. This problem is a convex optimization problem, and we use a standard solver in MATLAB to obtain its solution.
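To illustrate what a per-slot allocation with the logarithmic utility of the simulation looks like, the sketch below solves a deliberately simplified problem: maximize $\sum_i \omega_i \log(1 + x_i)$ subject to $\sum_i x_i = G$ and $0 \le x_i \le x_{i,\max}$. It omits the degradation-cost and external-source terms, so it is not the greedy baseline used here, and the function name `waterfill_log` is hypothetical. The KKT conditions give $x_i = \min\{\max\{\omega_i/\lambda - 1, 0\}, x_{i,\max}\}$, and the multiplier $\lambda$ is found by bisection.

```python
def waterfill_log(weights, caps, G, iters=200):
    """Maximize sum_i w_i*log(1+x_i) s.t. sum_i x_i = G and 0 <= x_i <= caps[i].

    Simplified per-slot problem (no degradation or external-energy terms).
    KKT: x_i = clip(w_i/lam - 1, 0, cap_i); the total allocation is
    non-increasing in lam, so bisect on lam until the total matches G.
    """
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    if not 0 <= G <= sum(caps):
        raise ValueError("G must lie in [0, sum(caps)]")

    def alloc(lam):
        return [min(max(w / lam - 1.0, 0.0), c) for w, c in zip(weights, caps)]

    # Invariant: sum(alloc(lo)) >= G >= sum(alloc(hi)).
    lo, hi = 1e-12, max(weights)  # alloc(hi) is all zeros; alloc(lo) hits every cap
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(alloc(mid)) > G:
            lo = mid  # too much allocated: raise the multiplier
        else:
            hi = mid
    return alloc(hi)


# Two EVs with the Table I per-slot limits and equal weights (omega_i = 1)
print(waterfill_log([1.0, 1.0], [0.55, 0.83], 1.2))  # approx [0.55, 0.65]
```

The log utility spreads the regulation amount as evenly as the per-EV limits allow, which is the sense in which such allocations are "fair".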

In Figs. 2 and 3, we compare the performance of WMRA with $V = V_{\max}$ and the performance of the greedy algorithm. From Fig. 2, with $s_{i,\max} = 0.9 s_{i,\mathrm{cap}}$, WMRA is superior to the greedy algorithm at all times, and the advantage is about 28%. In Fig. 3, we vary $s_{i,\max}$ from $0.3 s_{i,\mathrm{cap}}$ to $0.9 s_{i,\mathrm{cap}}$. The observations are as follows. First, WMRA uniformly outperforms the greedy algorithm over different values of $s_{i,\max}$. Second, as $s_{i,\max}$ increases, the social welfare under WMRA keeps increasing. This is because increasing $s_{i,\max}$ effectively increases $V_{\max}$, which improves the performance of WMRA. This observation is also consistent with the remarks after Theorem 1. In contrast, the social welfare under the greedy algorithm saturates when $s_{i,\max} > 0.7 s_{i,\mathrm{cap}}$.
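The remark after Theorem 1 ties $V_{\max}$ to the smallest preferred-range span $\min_i\{s_{i,\max} - s_{i,\min}\}$. The short computation below evaluates that span for the two EV types of Table I with $s_{i,\min} = 0.1 s_{i,\mathrm{cap}}$, over an illustrative grid of $s_{i,\max}$ between $0.3 s_{i,\mathrm{cap}}$ and $0.9 s_{i,\mathrm{cap}}$ (the exact grid used for Fig. 3 is an assumption here). It only shows that the span grows with $s_{i,\max}$ and is set by the smaller Type I battery; it does not compute $V_{\max}$ itself.

```python
S_CAP = {"I": 23.0, "II": 40.0}  # Table I battery capacities (kWh)


def smallest_span(s_max_frac, s_min_frac=0.1):
    """min_i (s_{i,max} - s_{i,min}) over the two EV types, in kWh."""
    return min((s_max_frac - s_min_frac) * cap for cap in S_CAP.values())


grid = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]  # illustrative s_max / s_cap values
spans = [smallest_span(f) for f in grid]
for f, span in zip(grid, spans):
    print(f"s_max = {f:.1f} s_cap -> smallest span = {span:.1f} kWh")
```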

In Fig. 4, we show the performance of WMRA with different values of $V$, namely $V \in \{0.2, 0.4, 0.6, 0.8, 1, 1.5, 2\} V_{\max}$. As expected, the social welfare grows as $V$ grows. In particular, WMRA outperforms the greedy algorithm even with $V = 0.2 V_{\max}$. From Lemma 6, we know that the energy state of each EV is guaranteed to be within $[s_{i,\min}, s_{i,\max}]$ when $V \in [0, V_{\max}]$. In Fig. 5, for $V$ equal to $V_{\max}$, $1.5 V_{\max}$, and $2 V_{\max}$, we show the evolution of a Type I EV's energy state under WMRA. When $V = V_{\max}$, the energy state is always within the preferred range; in contrast, when $V = 1.5 V_{\max}$ or $2 V_{\max}$, the energy state can exceed the preferred range from time to time. Furthermore, the larger $V$, the more frequently such violations happen. Therefore, the observations in Figs. 4 and 5 demonstrate the significance of $V_{\max}$ in achieving the maximum social welfare under WMRA while respecting the EVs' battery capacity.

Fig. 2. Time-averaged social welfare with $V = V_{\max}$.
Fig. 3. Time-averaged social welfare with various $s_{i,\max}$ and $V = V_{\max}$.
Fig. 4. Time-averaged social welfare with various values of $V$.
Fig. 5. Sample path of a Type I EV's energy state with $V = [1, 1.5, 2] V_{\max}$ (x-axis: time slot; legend: energy state under $V_{\max}$, $1.5 V_{\max}$, $2 V_{\max}$).
Fig. 6. Sample path of a Type I EV's time-averaged degradation cost with $V = 0.2 V_{\max}$ and $V_{\max}$ (x-axis: time slot; reference line $C_{i,\mathrm{up}1}$).

In Fig. 6, we display the time-averaged degradation cost of a Type I EV under WMRA with $V = 0.2 V_{\max}$ and $V = V_{\max}$, respectively. First, for both values of $V$, the average degradation cost gradually approaches the upper bound $C_{i,\mathrm{up}}$, which conforms to the feasibility conclusion in part 1) of Theorem 2 that, for each EV, the long-term constraint on the battery degradation cost holds. Second, the average degradation cost with the smaller $V$ reaches the upper bound sooner. This second observation, together with the observation in Fig. 4, demonstrates the role of $V$ in the trade-off between the objective and the constraints of P3. Hence, in practice, if the aggregator needs to satisfy the degradation cost constraint within a finite operational time, it can use a smaller $V$, sacrificing some social welfare. Alternatively, it can employ an operational upper bound that is smaller than the actual upper bound, to ensure that the degradation cost constraint is met within finite time. For example, suppose that in WMRA we set $C_{i,\mathrm{up}}$ to only 90% of the actual upper bound $C_{i,\mathrm{up}1}$, as indicated in Fig. 6. Then, the actual upper bound $C_{i,\mathrm{up}1}$ is reached under WMRA with $V = V_{\max}$ when $T = 1000$.
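With $C_i(x) = x^2$ and the bound $x_{i,\max}^2/4$ from the simulation setup, the degradation-cost bounds for the two EV types, and the 90% operational bound described above, work out as follows. The sketch assumes that the actual bound $C_{i,\mathrm{up}1}$ equals the setup's $x_{i,\max}^2/4$, and it also shows the running time average of per-slot costs, which is the quantity plotted in Fig. 6; the cost sequence in the usage line is made up for illustration.

```python
X_MAX = {"I": 0.55, "II": 0.83}  # Table I per-slot limits (kWh)


def degradation_bounds(x_max, operational_frac=0.9):
    """Actual bound x_max^2/4 (assumed C_{i,up1}) and the tighter operational bound."""
    actual = x_max ** 2 / 4
    return actual, operational_frac * actual


def running_average(costs):
    """Time-averaged cost (1/t) * sum of the first t per-slot costs, for t = 1..len(costs)."""
    out, total = [], 0.0
    for t, c in enumerate(costs, start=1):
        total += c
        out.append(total / t)
    return out


for ev_type, x in X_MAX.items():
    actual, operational = degradation_bounds(x)
    print(f"Type {ev_type}: C_up1 = {actual:.4f}, operational C_up = {operational:.4f}")
# Type I: C_up1 = 0.0756, operational C_up = 0.0681
# Type II: C_up1 = 0.1722, operational C_up = 0.1550

print(running_average([0.09, 0.06, 0.06]))  # hypothetical per-slot costs of one EV
```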

VI. Conclusion 

We studied a practical model of a dynamic aggregator- 
EVs system providing regulation service to a power grid. We 
formulated the regulation allocation optimization as a long- 
term time-averaged social welfare maximization problem. Our 
formulation accounts for random system dynamics, battery 
constraints, the costs of battery degradation and external 
energy sources, and especially, the dynamics of EVs. Adopting 
a general Lyapunov optimization framework, we developed 
a real-time WMRA algorithm for the aggregator to fairly 
allocate the regulation amount among EVs. The algorithm does 
not require any knowledge of the statistics of the system state. 
We bounded the gap between the performance of WMRA and that of the optimal solution, and showed that WMRA is asymptotically optimal as the EVs' battery capacities go to infinity. Simulation results demonstrated that WMRA offers substantial performance gains over a greedy algorithm that maximizes the per-slot social welfare.

Appendix A 
Proof of Lemma 1 

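It is easy to see that $(\mathbf{x}^*_{d,t}, \mathbf{x}^*_{u,t})$ is feasible for P1. To show that $(\mathbf{x}^{\mathrm{opt}}_{d,t}, \mathbf{x}^{\mathrm{opt}}_{u,t}, \mathbf{z}^{\mathrm{opt}})$ is feasible for P2, it suffices to show that $\mathbf{z}^{\mathrm{opt}}$ satisfies the two constraints of P2 on the auxiliary variables. Using the definition of $z^{\mathrm{opt}}_i$, the time-average constraint naturally holds. Also, since $x^{\mathrm{opt}}_{i,t}$ lies in $[0, x_{i,\max}]$, which is a closed interval, the range constraint on $z^{\mathrm{opt}}_i$ holds.

We claim that
$$f_1(\mathbf{x}^{\mathrm{opt}}_{d,t}, \mathbf{x}^{\mathrm{opt}}_{u,t}) = f_2(\mathbf{x}^{\mathrm{opt}}_{d,t}, \mathbf{x}^{\mathrm{opt}}_{u,t}, \mathbf{z}^{\mathrm{opt}}) \le f_2(\mathbf{x}^*_{d,t}, \mathbf{x}^*_{u,t}, \mathbf{z}^*_t) \le f_1(\mathbf{x}^*_{d,t}, \mathbf{x}^*_{u,t}) \le f_1(\mathbf{x}^{\mathrm{opt}}_{d,t}, \mathbf{x}^{\mathrm{opt}}_{u,t}). \qquad (23)$$

Using the definition of $z^{\mathrm{opt}}_i$ in $f_2(\cdot)$, the first equality holds. The first and the third inequalities hold since $(\mathbf{x}^*_{d,t}, \mathbf{x}^*_{u,t}, \mathbf{z}^*_t)$ and $(\mathbf{x}^{\mathrm{opt}}_{d,t}, \mathbf{x}^{\mathrm{opt}}_{u,t})$ are optimal for $f_2(\cdot)$ and $f_1(\cdot)$, respectively. The second inequality is derived using Jensen's inequality for concave functions. Since the two ends of (23) are identical, all inequalities in (23) hold with equality, which indicates the equivalence of P1 and P2.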
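The second inequality in (23) rests on Jensen's inequality for the concave utility. A quick numerical check with the simulation's $U(x) = \log(1 + x)$ and samples drawn uniformly in $[0, 0.55]$ (an arbitrary choice matching the Type I limit) illustrates that the average of $U(z_{i,t})$ never exceeds $U$ of the average:

```python
import math
import random


def U(x):
    """Utility used in the simulations: U(x) = log(1 + x)."""
    return math.log1p(x)


def jensen_gap(samples):
    """U(mean) - mean(U); non-negative for concave U by Jensen's inequality."""
    mean = sum(samples) / len(samples)
    return U(mean) - sum(U(z) for z in samples) / len(samples)


rng = random.Random(0)
gaps = [jensen_gap([rng.uniform(0.0, 0.55) for _ in range(50)]) for _ in range(1000)]
print(min(gaps) >= 0.0)  # True
```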

Appendix B 
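Proof of Lemma 2

Let $T$ be large enough. For the i-th EV, decompose the total regulation amount within $T - 1$ time slots as
$$\sum_{t=0}^{T-1} b_{i,t} = \sum_{t=0}^{t_{il,k^*}-1} b_{i,t} + \sum_{t=t_{il,k^*}}^{T-1} b_{i,t}, \qquad (24)$$
where $k^* = n_i(T-1)$. On the right-hand side of (24), the first term corresponds to the total regulation amount before the last leaving time up to time $T - 1$, and the second term corresponds to the remaining total regulation amount.

To prove Lemma 2, it suffices to show that
$$\lim_{T\to\infty}\frac{1}{T}\mathbb{E}\Big[\sum_{t=0}^{t_{il,k^*}-1} b_{i,t}\Big] \quad \text{and} \quad \lim_{T\to\infty}\frac{1}{T}\mathbb{E}\Big[\sum_{t=t_{il,k^*}}^{T-1} b_{i,t}\Big]$$
both equal zero. Note that $b_{i,t} = 0$ when $1_{i,t} = 0$. Together with the boundedness of $s_{i,t}$, it is not difficult to see that the latter limit equals zero. We now show that the former one also equals zero. Based on the energy-state dynamics, the first term in (24) can be expressed as
$$\sum_{t=0}^{t_{il,k^*}-1} b_{i,t} = \sum_{k=1}^{k^*} s_{i,t_{il,k}} - \sum_{k=1}^{k^*} s_{i,t_{ir,k}} = \sum_{k=1}^{k^*-1} \Delta_{i,k} + s_{i,t_{il,k^*}} - s_{i,t_{ir,1}}, \qquad (25)$$
where $\Delta_{i,k} = s_{i,t_{il,k}} - s_{i,t_{ir,k+1}}$.

Consider the first term on the right-hand side of (25). We have
$$\frac{1}{T}\mathbb{E}\Big[\Big|\sum_{k=1}^{k^*-1}\Delta_{i,k}\Big|\Big] \le \mathbb{E}\Big[\Big|\frac{1}{k^*-1}\sum_{k=1}^{k^*-1}\Delta_{i,k}\Big|\Big] = \mathbb{E}\Big[\Big|\frac{1}{n_i(T-1)-1}\sum_{k=1}^{n_i(T-1)-1}\Delta_{i,k}\Big|\Big], \qquad (26)$$
where in (26) we have replaced $k^*$ with its definition, and the inequality holds since $n_i(T-1) \le T - 1$. Using the assumptions that $\lim_{n\to\infty}\mathbb{E}\big[\big|\frac{1}{n}\sum_{k=1}^{n}\Delta_{i,k}\big|\big] = 0$ and $\lim_{t\to\infty} n_i(t) = \infty$, from (26) the right-hand side vanishes as $T \to \infty$, and therefore, since $|\mathbb{E}[X]| \le \mathbb{E}[|X|]$,
$$\lim_{T\to\infty}\frac{1}{T}\mathbb{E}\Big[\sum_{k=1}^{k^*-1}\Delta_{i,k}\Big] = 0. \qquad (27)$$

Taking expectations of both sides of (25), dividing them by $T$, and then taking limits gives
$$\lim_{T\to\infty}\frac{1}{T}\mathbb{E}\Big[\sum_{t=0}^{t_{il,k^*}-1} b_{i,t}\Big] = \lim_{T\to\infty}\frac{1}{T}\mathbb{E}\Big[\sum_{k=1}^{k^*-1}\Delta_{i,k}\Big] + \lim_{T\to\infty}\frac{1}{T}\mathbb{E}\big[s_{i,t_{il,k^*}} - s_{i,t_{ir,1}}\big] = 0,$$
where the last equality follows from (27) and the boundedness of $s_{i,t}$. This completes the proof.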
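The decomposition (25) is a telescoping identity, so it can be checked on any sample path. The sketch below simulates one EV with a hypothetical 2-state presence chain, arbitrary per-slot changes $b_{i,t}$ while the EV is present, and arbitrary energy changes while it is away, then compares both sides of (25) with $\Delta_{i,k} = s_{i,t_{il,k}} - s_{i,t_{ir,k+1}}$. All numeric parameters are made up, and the energy state is deliberately not clipped, since the identity does not depend on it.

```python
import random


def check_identity_25(T=5000, x_max=0.55, seed=0):
    """Return (lhs, rhs, k_star) for identity (25) on one simulated sample path."""
    rng = random.Random(seed)
    # Hypothetical presence chain 1_{i,t}; the EV is present at t = 0, so t_{ir,1} = 0.
    present = [True]
    for _ in range(T):
        stay = 0.9 if present[-1] else 0.8
        present.append(present[-1] if rng.random() < stay else not present[-1])

    s = 10.0             # s_{i,0}
    returns = [(0, s)]   # (t_{ir,k}, s_{i,t_{ir,k}})
    leaves = []          # (t_{il,k}, s_{i,t_{il,k}})
    b = [0.0] * T
    for t in range(T):
        if present[t]:
            b[t] = rng.uniform(-x_max, x_max)  # regulation while plugged in
        s_next = s + b[t]                      # b_{i,t} = 0 while away
        if present[t] and not present[t + 1]:
            leaves.append((t + 1, s_next))
        elif not present[t] and present[t + 1]:
            s_next += rng.uniform(-1.0, 1.0)   # arbitrary energy change while away
            returns.append((t + 1, s_next))
        s = s_next

    # k* = n_i(T-1): number of leaving times up to T - 1
    done = [(tl, sl) for tl, sl in leaves if tl <= T - 1]
    k_star = len(done)
    if k_star == 0:
        return 0.0, 0.0, 0
    t_star, s_star = done[-1]
    lhs = sum(b[:t_star])
    deltas = [done[j][1] - returns[j + 1][1] for j in range(k_star - 1)]
    rhs = sum(deltas) + s_star - returns[0][1]
    return lhs, rhs, k_star


print(check_identity_25())  # lhs and rhs agree up to floating-point rounding
```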



Appendix C 
Proof of Proposition 1

Based on the definition of $L(\Theta_t)$,
$$L(\Theta_{t+1}) - L(\Theta_t) = \frac{1}{2}\sum_{i=1}^{N}\Big[\big(H_{i,t+1}^2 - H_{i,t}^2\big) + \big(J_{i,t+1}^2 - J_{i,t}^2\big) + \big(K_{i,t+1}^2 - K_{i,t}^2\big)\Big]. \qquad (28)$$

In (28), $H_{i,t+1}^2 - H_{i,t}^2$ and $J_{i,t+1}^2 - J_{i,t}^2$ can be upper bounded as follows:
$$H_{i,t+1}^2 - H_{i,t}^2 \le 2H_{i,t}(z_{i,t} - x_{i,t}) + x_{i,\max}^2, \qquad (29)$$
$$J_{i,t+1}^2 - J_{i,t}^2 \le 2J_{i,t}\big[1_{d,t}C_i(x_{id,t}) + 1_{u,t}C_i(x_{iu,t}) - C_{i,\mathrm{up}}\big] + \max\big\{C_{i,\mathrm{up}}^2, (c_{i,\max} - C_{i,\mathrm{up}})^2\big\}. \qquad (30)$$

Now consider $K_{i,t+1}^2 - K_{i,t}^2$. When $1_{i,t} = 1$, since $K_{i,t+1} = K_{i,t} + b_{i,t}$,
$$K_{i,t+1}^2 - K_{i,t}^2 \le 2K_{i,t}b_{i,t} + x_{i,\max}^2. \qquad (31)$$

When $1_{i,t} = 0$, we have $b_{i,t} = 0$. In particular, for $t \in \{t_{il,k}, t_{il,k} + 1, \dots, t_{ir,k+1} - 2\}$, $\forall k \in \{1, 2, \dots\}$, and for $t \le t_{ir,1} - 2$ (if such $t$ exists), there is $K_{i,t+1} = K_{i,t}$, so we can express
$$K_{i,t+1}^2 - K_{i,t}^2 = 2K_{i,t}b_{i,t}. \qquad (32)$$
For $t = t_{ir,k} - 1$, $\forall k \in \{1, 2, \dots\}$, there is $K_{i,t+1}^2 - K_{i,t}^2 = (s_{i,t_{ir,k}} - C_i)^2 - K_{i,t}^2 \le c_i^2$, so
$$K_{i,t+1}^2 - K_{i,t}^2 \le 2K_{i,t}b_{i,t} + c_i^2. \qquad (33)$$
Combining (31), (32), and (33),
$$K_{i,t+1}^2 - K_{i,t}^2 \le 2K_{i,t}b_{i,t} + \max\big\{x_{i,\max}^2, c_i^2\big\}. \qquad (34)$$

Imposing the upper bounds (29), (30), and (34) on the right-hand side of (28), taking the conditional expectation of both sides given $\Theta_t$, and then subtracting the term $V\,\mathbb{E}\big[\sum_{i=1}^{N}\omega_i U(z_{i,t}) - e_t \,\big|\, \Theta_t\big]$ gives the upper bound in Proposition 1.
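Inequalities (29) and (30) follow from expanding the squares and bounding the per-slot increments. The randomized check below exercises them under assumptions labeled in the code: $H_{i,t}$ is tested both as an unprojected queue $H_{i,t+1} = H_{i,t} + z_{i,t} - x_{i,t}$ (which may go negative) and as a projected one $\max\{H_{i,t} + z_{i,t} - x_{i,t}, 0\}$; $J_{i,t+1} = \max\{J_{i,t} + \text{cost}_{i,t} - C_{i,\mathrm{up}}, 0\}$; and $c_{i,\max} = x_{i,\max}^2$ is taken as the per-slot degradation-cost bound implied by $C_i(x) = x^2$. The paper's own queue definitions may differ in form.

```python
import random


def count_violations(trials=100_000, x_max=0.55, seed=1):
    """Count sampled cases where bound (29) or (30) fails (expected: 0)."""
    rng = random.Random(seed)
    c_up = x_max ** 2 / 4  # C_{i,up} from the simulation setup
    c_max = x_max ** 2     # assumed per-slot cost bound for C_i(x) = x^2
    j_const = max(c_up ** 2, (c_max - c_up) ** 2)
    tol = 1e-9             # slack for floating-point rounding
    bad = 0
    for _ in range(trials):
        z, x = rng.uniform(0, x_max), rng.uniform(0, x_max)
        cost = rng.uniform(0, x_max) ** 2  # C_i(x) = x^2 on [0, x_max]
        h = rng.uniform(-50, 50)           # unprojected queue: H may be negative
        g = rng.uniform(0, 50)             # projected queue: H >= 0
        # (29) for both queue forms
        for H, H_next in ((h, h + z - x), (g, max(g + z - x, 0.0))):
            if H_next ** 2 - H ** 2 > 2 * H * (z - x) + x_max ** 2 + tol:
                bad += 1
        # (30) for the projected degradation-cost queue
        J = rng.uniform(0, 50)
        J_next = max(J + cost - c_up, 0.0)
        if J_next ** 2 - J ** 2 > 2 * J * (cost - c_up) + j_const + tol:
            bad += 1
    return bad


print(count_violations())  # 0
```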

Appendix D 
Proof of Lemma 4

We need the following lemma.

Lemma 7: Under the WMRA algorithm, the queue backlog $H_{i,t}$ associated with the i-th EV is upper bounded as $H_{i,t} \le V\omega_i\mu + x_{i,\max}$.

Proof: This can be shown using a method similar to that in [16], and the technical condition (…) is needed. ■

1) Consider $G_t > 0$. Suppose that when $K_{i,t} > x_{i,\max} + V(\omega_i\mu + e_{\max})$, one optimal solution under WMRA is $\mathbf{x}_{d,t}$ with $x_{id,t} > 0$. We show that we can then find another solution $\tilde{\mathbf{x}}_{d,t}$ with $\tilde{x}_{jd,t} = x_{jd,t}$, $\forall j \ne i$, and $\tilde{x}_{id,t} = 0$, resulting in a strictly smaller objective value, which is a contradiction.

Using the objective function of (b1), this is equivalent to showing
$$Ve_{s,t}\Big(G_t - \sum_{j=1}^{N} x_{jd,t}\Big) - \sum_{j=1}^{N} H_{j,t}x_{jd,t} + \sum_{j=1}^{N} J_{j,t}C_j(x_{jd,t}) + \sum_{j=1}^{N} K_{j,t}x_{jd,t}$$
$$> Ve_{s,t}\Big(G_t - \sum_{j=1}^{N} x_{jd,t} + x_{id,t}\Big) - \sum_{j \ne i} H_{j,t}x_{jd,t} + \sum_{j \ne i} J_{j,t}C_j(x_{jd,t}) + \sum_{j \ne i} K_{j,t}x_{jd,t},$$
which is equivalent to
$$-H_{i,t}x_{id,t} + J_{i,t}C_i(x_{id,t}) + K_{i,t}x_{id,t} > Ve_{s,t}x_{id,t}. \qquad (35)$$
Since $J_{i,t}C_i(x_{id,t}) \ge 0$, from (35) it suffices to show that
$$(K_{i,t} - H_{i,t} - Ve_{s,t})\,x_{id,t} > 0. \qquad (36)$$
Since $x_{id,t} > 0$, (36) holds by the assumption $K_{i,t} > x_{i,\max} + V(\omega_i\mu + e_{\max})$ and Lemma 7, which upper bounds $H_{i,t}$: indeed, $K_{i,t} - H_{i,t} - Ve_{s,t} > V(e_{\max} - e_{s,t}) \ge 0$.

2) Consider $G_t < 0$. Suppose that when $K_{i,t} < -x_{i,\max} - V(\omega_i\mu + e_{\max})$, one optimal solution under WMRA is $\mathbf{x}_{u,t}$ with $x_{iu,t} > 0$. Then there is a contradiction, since we can construct another solution $\tilde{\mathbf{x}}_{u,t}$ with $\tilde{x}_{ju,t} = x_{ju,t}$, $\forall j \ne i$, and $\tilde{x}_{iu,t} = 0$, which results in a strictly smaller objective value. The proof is similar to that in 1) and is omitted here.
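The argument in case 1) can be checked numerically for a single EV. Keeping only the terms of the (b1) objective that depend on $x_{id,t}$, as in (35), and using $C_i(x) = x^2$ from the simulations, the per-EV term is $(K_{i,t} - H_{i,t} - Ve_{s,t})x + J_{i,t}x^2$. The sketch below samples queue values with $K_{i,t}$ above the threshold and $H_{i,t}$ within the bound of Lemma 7, and confirms by grid search that $x_{id,t} = 0$ minimizes this term. The values of $\mu$, $V$, and all sampling ranges are hypothetical.

```python
import random


def per_ev_term(x, K, H, J, V, e_s):
    """x-dependent part of the (b1) objective for one EV, cf. (35), with C_i(x) = x^2."""
    return (K - H - V * e_s) * x + J * x ** 2


def best_x(K, H, J, V, e_s, x_max, n=100):
    """Grid-search minimizer of per_ev_term over [0, x_max]."""
    grid = [k * x_max / n for k in range(n + 1)]
    return min(grid, key=lambda x: per_ev_term(x, K, H, J, V, e_s))


def check_threshold(trials=2000, x_max=0.55, omega=1.0, mu=1.0, e_max=0.12, V=5.0, seed=2):
    """True if x_id = 0 is optimal whenever K exceeds x_max + V*(omega*mu + e_max)."""
    rng = random.Random(seed)
    threshold = x_max + V * (omega * mu + e_max)
    for _ in range(trials):
        H = rng.uniform(-10.0, V * omega * mu + x_max)  # within the Lemma 7 bound
        J = rng.uniform(0.0, 10.0)
        e_s = rng.uniform(0.0, e_max)
        K = threshold + rng.uniform(1e-6, 5.0)
        if best_x(K, H, J, V, e_s, x_max) != 0.0:
            return False
    return True


print(check_threshold())  # True
```

Below the threshold the minimizer can be positive, e.g., `best_x(-5.0, 0.0, 1.0, 1.0, 0.1, 0.55)` returns the grid point at (about) 0.55.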

Appendix E 
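Proof of Theorem 1

We first give the following fact, which is a direct consequence of the results in [16].

Lemma 8: There exists a stationary randomized regulation allocation solution $(\mathbf{x}^s_{d,t}, \mathbf{x}^s_{u,t})$ that depends only on the system state $A_t$ and satisfies
$$\mathbb{E}[x^s_{i,t}] = z^s_i,\ \forall i, \ \text{for some } z^s_i \in [0, x_{i,\max}], \qquad (37)$$
$$\sum_{i=1}^{N}\omega_i U(z^s_i) - \mathbb{E}[e^s_t] = f_2(\hat{\mathbf{x}}_{d,t}, \hat{\mathbf{x}}_{u,t}, \hat{\mathbf{z}}_t), \qquad (38)$$
$$\mathbb{E}\big[1_{d,t}C_i(x^s_{id,t}) + 1_{u,t}C_i(x^s_{iu,t})\big] \le C_{i,\mathrm{up}},\ \forall i, \ \text{and} \qquad (39)$$
$$\mathbb{E}[b^s_{i,t}] = 0,\ \forall i, \qquad (40)$$
where the expectations are taken over the randomness of the system and the randomness of $(\mathbf{x}^s_{d,t}, \mathbf{x}^s_{u,t})$, and $(\hat{\mathbf{x}}_{d,t}, \hat{\mathbf{x}}_{u,t}, \hat{\mathbf{z}}_t)$ is an optimal solution for P3.

1) For brevity, define $W_t = \sum_{i=1}^{N}\omega_i U(z_{i,t}) - e_t$. Since WMRA minimizes the upper bound in (22), plugging $(\mathbf{x}^s_{d,t}, \mathbf{x}^s_{u,t})$, together with $z_{i,t} = z^s_i, \forall t$, into the right-hand side of (22), we have
$$\Delta(\Theta_t) - V\,\mathbb{E}[W_t \mid \Theta_t] \le B - Vf_2(\hat{\mathbf{x}}_{d,t}, \hat{\mathbf{x}}_{u,t}, \hat{\mathbf{z}}_t), \qquad (41)$$
where (37), (38), (39), and (40) are used. Since $W_t \le \sum_{i=1}^{N}\omega_i U(x_{i,\max})$, from (41),
$$\Delta(\Theta_t) \le D \triangleq B + V\Big(\sum_{i=1}^{N}\omega_i U(x_{i,\max}) - f_2(\hat{\mathbf{x}}_{d,t}, \hat{\mathbf{x}}_{u,t}, \hat{\mathbf{z}}_t)\Big).$$
Using Theorem 4.1 in [16], $\mathbb{E}[|H_{i,t}|]$ and $\mathbb{E}[|J_{i,t}|]$ are upper bounded by $\sqrt{2tD + 2L(\Theta_0)}$, $\forall t$. Hence, the virtual queues $H_{i,t}$ and $J_{i,t}$ are mean rate stable, and the following limit constraints hold:
$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[z_{i,t}] = \lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[x_{i,t}],\ \forall i, \qquad (42)$$
$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\big[1_{d,t}C_i(x_{id,t}) + 1_{u,t}C_i(x_{iu,t})\big] \le C_{i,\mathrm{up}},\ \forall i.$$
Since $s_{i,t}$ is bounded under WMRA by Lemma 6, using Lemma 2 we have $\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[b_{i,t}] = 0$, $\forall i$. In addition, note that $(\mathbf{x}_{d,t}, \mathbf{x}_{u,t})$ is derived under the constraints of the optimization problems (a), (b1), and (b2). Therefore, $(\mathbf{x}_{d,t}, \mathbf{x}_{u,t})$ is feasible for P3, P2, and P1.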

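2) Taking expectations of both sides of (41) and summing over $t \in \{0, 1, \dots, T-1\}$ for some $T > 1$, we have
$$\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[W_t] \ge \frac{\mathbb{E}[L(\Theta_T) - L(\Theta_0)]}{VT} + f_2(\hat{\mathbf{x}}_{d,t}, \hat{\mathbf{x}}_{u,t}, \hat{\mathbf{z}}_t) - \frac{B}{V} \ge f_2(\hat{\mathbf{x}}_{d,t}, \hat{\mathbf{x}}_{u,t}, \hat{\mathbf{z}}_t) - \frac{B}{V} - \frac{\mathbb{E}[L(\Theta_0)]}{VT}, \qquad (43)$$
where (43) holds since $L(\Theta_t)$ is non-negative. Also,
$$\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[W_t] = \sum_{i=1}^{N}\omega_i\Big(\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[U(z_{i,t})]\Big) - \frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[e_t] \le \sum_{i=1}^{N}\omega_i U\Big(\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[z_{i,t}]\Big) - \frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[e_t], \qquad (44)$$
where the inequality in (44) is derived using Jensen's inequality for concave functions. Combining (43) and (44) and taking limits on both sides, there is
$$\sum_{i=1}^{N}\omega_i U\Big(\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[z_{i,t}]\Big) - \lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[e_t]$$
$$\ge f_2(\hat{\mathbf{x}}_{d,t}, \hat{\mathbf{x}}_{u,t}, \hat{\mathbf{z}}_t) - B/V \qquad (45)$$
$$\ge f_2(\mathbf{x}^*_{d,t}, \mathbf{x}^*_{u,t}, \mathbf{z}^*_t) - B/V \qquad (46)$$
$$= f_1(\mathbf{x}^{\mathrm{opt}}_{d,t}, \mathbf{x}^{\mathrm{opt}}_{u,t}) - B/V, \qquad (47)$$
where $(\mathbf{x}^*_{d,t}, \mathbf{x}^*_{u,t}, \mathbf{z}^*_t)$ and $(\mathbf{x}^{\mathrm{opt}}_{d,t}, \mathbf{x}^{\mathrm{opt}}_{u,t})$ are defined in Section III-A, (45) holds since $\mathbb{E}[L(\Theta_0)]$ is bounded, (46) holds since the feasible set of the optimization variables is enlarged from P2 to P3, and (47) is true due to Lemma 1.

Rewrite the objective function of P1 under WMRA, i.e., $f_1(\mathbf{x}_{d,t}, \mathbf{x}_{u,t})$, as
$$f_1(\mathbf{x}_{d,t}, \mathbf{x}_{u,t}) = \sum_{i=1}^{N}\omega_i U\Big(\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[z_{i,t}]\Big) - \lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[e_t]$$
$$+ \sum_{i=1}^{N}\omega_i U\Big(\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[x_{i,t}]\Big) - \sum_{i=1}^{N}\omega_i U\Big(\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}[z_{i,t}]\Big).$$
Due to (42), the last two terms cancel each other. Hence, by (47), we have $f_1(\mathbf{x}_{d,t}, \mathbf{x}_{u,t}) \ge f_1(\mathbf{x}^{\mathrm{opt}}_{d,t}, \mathbf{x}^{\mathrm{opt}}_{u,t}) - B/V$, which completes the proof.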